Abstract

Background: The cytochrome P450 (CYP) superfamily enables terrestrial plants to adapt to harsh environments. 
CYPs are key enzymes involved in a wide range of metabolic pathways. It is particularly useful to be able to analyse 
the three-dimensional (3D) structure when investigating the interactions between CYPs and their substrates. 
However, only two plant CYP structures have been resolved. In addition, no currently available databases contain 
structural information on plant CYPs and ligands. Fortunately, the 3D structure of CYPs is highly conserved and this 
has made it possible to obtain structural information from template-based modelling (TBM). 

Description: The CYP Structure Interface (CYPSI) is a platform for CYP studies. CYPSI integrates the 3D structures of 266 A. thaliana CYPs predicted by three TBM methods: BMCD, which we developed specifically for CYP TBM, and two well-known web servers, MUSTER and I-TASSER. After careful template selection and optimization, the models built by BMCD were accurate enough for practical application, which we demonstrated with a docking example that searched for the CYPs responsible for ABA 8'-hydroxylation. CYPSI also provides extensive resources for A. thaliana CYP structure and function studies, including 400 PDB entries for solved CYPs, 48 metabolic pathways associated with A. thaliana CYPs, 232 reported CYP ligands and 18 A. thaliana CYPs docked with ligands (61 complexes in total). In addition, CYPSI can search for similar sequences and chemicals.

Conclusions: CYPSI provides comprehensive structure and function information for A. thaliana CYPs, which should facilitate investigations into the interactions between CYPs and their substrates. CYPSI has a user-friendly interface and is available at http://bioinfo.cau.edu.cn/CYPSI.

Keywords: Cytochrome P450, CYP Structure Interface, Template-based modelling, BMCD, ABA 8'-hydroxylation, CYP707A



Background 

Cytochrome P450s (CYPs) are heme-containing monooxygenases found in all eukaryotes. They catalyse various chemical reactions, e.g. hydroxylations, epoxidations, ring extensions and carbon-carbon bond cleavages, and have potential pharmacological and agronomic applications [1-4]. In terrestrial plants, CYPs play important roles in responses to biotic and abiotic stimuli by metabolizing a wide range of small organic compounds [5-8]. CYPs are also involved in the biosynthesis of many structural components [9-13].
The three-dimensional (3D) structures of CYPs can provide valuable information for investigating the interactions between CYPs and ligands. To date, there are more than 5,100 annotated plant CYP sequences [3,14], but only two have resolved 3D structures (CYP74A and CYP74A2) [15,16]. CYP structures are difficult to determine by standard X-ray or NMR analysis because most CYPs are membrane-bound proteins. Template-based modelling (TBM) is a feasible alternative for obtaining CYP structural information because the CYP 3D structure is highly conserved [1]. There are many options for CYP TBM, e.g. the class-dependent sequence alignment strategy for CYP TBM [17], SWISS-MODEL [18], MUSTER [19] and I-TASSER [20]. I-TASSER was found to be the most accurate in the recent Critical Assessment of Techniques for Protein Structure Prediction rounds (CASP7-9) [21-23]. However, the models generated by these web servers contain no heme, whose position is important when investigating the interactions between CYPs and substrates. We therefore developed BMCD, a pipeline specifically for CYP TBM whose name abbreviates the software it uses: PSI-BLAST, MUSCLE, COMPASS and Discovery Studio 2.1 [24-27].

Most current CYP-related resources focus on gene annotation, e.g. the Cytochrome P450 Homepage [28], the CYP engineering database (CYPED) [29] and the Fungal Cytochrome P450 Database (FCPD) [30]. Some databases do collect CYP structure information: CYPED presents all available 3D CYP structures from the Protein Data Bank [29], and SuperCYP collects many drug-drug interactions and theoretical models for human CYPs [31]. However, neither provides further information about the interactions between ligands and CYPs [29,31].

In this study, we developed the CYP Structure Interface (CYPSI), a platform that provides comprehensive structure and function information on all 266 A. thaliana CYPs. The models for these CYPs were predicted using the BMCD pipeline and the web servers MUSTER and I-TASSER. CYPSI also provides extensive resources for CYPs, including 400 PDB entries for solved CYPs, 48 metabolic pathways associated with A. thaliana CYPs, 232 reported CYP ligands and 18 A. thaliana CYPs docked with ligands (61 complexes in total). To demonstrate the quality and utility of the 3D structures in CYPSI, this paper presents a case study that searches for the candidate CYPs responsible for abscisic acid (ABA) 8'-hydroxylation. With sequence alignment, the BMCD service for template selection and a structure similarity search facility for small molecules, CYPSI is a comprehensive tool for investigating plant CYP structures and functions.

Construction and Content 

Data collection 

The solved CYP structures were collected from the Protein Data Bank (http://www.rcsb.org/) [32]. As of December 2011, there were 400 PDB entries associated with 76 CYPs (see Additional file 1 for details).

A total of 290 A. thaliana CYP isoforms from 272 CYP genes in 47 CYP families were collected from TAIR10 (http://www.arabidopsis.org/) and http://www.p450.kvl.dk/ [33,34], including functional annotations, protein sequences, coding sequences (CDS) and the 3,000 base pairs (bp) upstream and 3,000 bp downstream of each CDS.
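The flank extraction described above can be sketched as follows. This is a minimal illustration, not the CYPSI code: it assumes 1-based, inclusive CDS coordinates (as in TAIR/GFF annotations) and a chromosome held as a plain string, and the function names are hypothetical. On the minus strand, "upstream" lies to the right of the CDS on the chromosome, so both flanks and the CDS are reverse-complemented.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCAntgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_with_flanks(chrom: str, start: int, end: int, strand: str,
                    flank: int = 3000) -> tuple[str, str, str]:
    """Return (upstream, cds, downstream) in the gene's own 5'->3' orientation.

    start/end are assumed 1-based and inclusive; flanks are clipped at the
    chromosome ends, so they may be shorter than `flank`.
    """
    if not 1 <= start <= end <= len(chrom):
        raise ValueError("coordinates outside chromosome")
    i, j = start - 1, end  # convert to a 0-based, half-open slice
    left = chrom[max(0, i - flank):i]
    cds = chrom[i:j]
    right = chrom[j:j + flank]
    if strand == "+":
        return left, cds, right
    if strand == "-":
        # On the minus strand, upstream is the right-hand flank.
        return (reverse_complement(right), reverse_complement(cds),
                reverse_complement(left))
    raise ValueError("strand must be '+' or '-'")
```

For a real chromosome, `cds_with_flanks(chrom, start, end, strand)` would return the CDS together with up to 3,000 bp on each side.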

In addition, 48 metabolic pathways were manually collected from the PMN database [35] and from the scientific literature. Pathways clarified in the scientific literature were marked with "y" and those that had not been clarified were marked with "na" (see Additional file 2). A total of 232 ligands in these pathways were collected from PubChem [36] or built manually in Discovery Studio 2.1.

Template-based modelling 

BMCD was developed specifically for CYP TBM, with an emphasis on template selection and sequence alignment. First, profile-profile alignments between the sequence profiles of targets and templates were constructed using COMPASS. Next, the five templates with the smallest evolutionary distances (ED) were selected for further TBM. ED was calculated as described in reference [37] using the substitution score matrix MIYS960102 [38]. Finally, for each target-template pair, three initial models were built using MODELLER in Discovery Studio 2.1 (Accelrys Software Inc.) [27], with the coenzyme heme copied from the template. Of the 15 initial models created for each target, the one with the highest Profile-3D score was retained for further refinement.
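The template-ranking and model-selection logic above can be sketched as follows. The exact ED formula of reference [37] is not reproduced here; as a stand-in, the sketch uses a Feng-Doolittle-style distance computed from alignment scores (which would come from a substitution matrix such as MIYS960102), and model building is represented by a hypothetical callback rather than an actual MODELLER call.

```python
import math
from typing import Callable, Sequence


def score_distance(s_real: float, s_rand: float,
                   s_self_a: float, s_self_b: float) -> float:
    """Feng-Doolittle-style distance from alignment scores (stand-in for ED).

    s_real: score of the target-template alignment
    s_rand: expected score for shuffled versions of the two sequences
    s_self_a, s_self_b: self-alignment scores of target and template
    """
    s_ident = (s_self_a + s_self_b) / 2.0
    s_eff = (s_real - s_rand) / (s_ident - s_rand)
    # Clamp to (0, 1] so unrelated pairs get a large but finite distance.
    s_eff = min(max(s_eff, 1e-6), 1.0)
    return -math.log(s_eff)


def select_model(template_ed: dict[str, float],
                 build_models: Callable[[str], Sequence[tuple[str, float]]],
                 n_templates: int = 5) -> tuple[str, float]:
    """Keep the n templates with the smallest ED, build models from each,
    and return the (model_id, Profile-3D score) pair with the highest score.

    template_ed: template ID -> ED to the target
    build_models: hypothetical callback returning (model_id, score) pairs
                  for one template, e.g. three MODELLER runs per template
    """
    closest = sorted(template_ed, key=template_ed.get)[:n_templates]
    candidates = [m for t in closest for m in build_models(t)]
    return max(candidates, key=lambda m: m[1])
```

With five templates and three models per template, `select_model` compares 15 candidates, matching the count given above.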

The CHARMm force field in Discovery Studio 2.1 was used for all processes in this project, including energy minimization, molecular dynamics (MD) simulation, docking (CDOCKER) and interaction energy calculations (see Additional file 1 for details).

Besides BMCD, two servers, MUSTER [19] and I-TASSER [20], were also used to generate A. thaliana CYP models; the A. thaliana CYP sequences were submitted to them manually. The prediction results indicated that, of the 279 A. thaliana CYPs longer than 300 amino acids, 266 have complete CYP structural domains.

Profile-3D [39] in Discovery Studio 2.1 was used to compare the performance of the three methods; the higher the Profile-3D Score Ratio, the better the 3D structural quality (Figure 1). A paired t-test [40] showed that the Profile-3D Score Ratios of the models predicted by BMCD were significantly higher than those of the models predicted by MUSTER (P < 2.2e-16) or I-TASSER (P = 7.1e-13). The Profile-3D Score Ratios of the predicted models ranged from 0.75 to 0.95, close to those of the solved structures, which ranged from 0.90 to 1.20. This suggests that the models of A. thaliana CYPs are of sufficient quality for practical application.
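The paired comparison above pairs each CYP's BMCD ratio with its MUSTER (or I-TASSER) ratio. A minimal sketch of the paired t statistic is shown below using only the Python standard library; the input values in the example call are purely illustrative, not CYPSI data. `scipy.stats.ttest_rel` returns the same statistic together with a P-value.

```python
import math
import statistics


def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom.

    x[i] and y[i] must refer to the same target, e.g. the Profile-3D Score
    Ratios of one CYP modelled by BMCD (x) and by MUSTER (y).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), using the sample standard deviation.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Illustrative values only (not results from the paper).
t_stat, dof = paired_t([0.90, 0.86, 0.88, 0.92], [0.81, 0.82, 0.80, 0.85])
```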

A practical application of CYP 3D models

To demonstrate the usefulness of the CYP 3D models, a practical application, a search for the CYPs responsible for ABA 8'-hydroxylation, is presented below.

Figure 1 Models predicted by BMCD have higher Profile-3D Score Ratios. The points above the dotted diagonal line represent models whose Profile-3D Score Ratios given by BMCD were higher than the ratios from MUSTER or I-TASSER.

Firstly, ABA was docked using CDOCKER [41] to all nine candidate CYPs (11 models) proposed by Eiji Nambara et al. [5] to be responsible for ABA 8'-hydroxylation. CYP97A3, CYP97B3, CYP97C1 and CYP714A1 were excluded from further analysis because, according to our docking results (data not shown), they could not bind ABA in a conformation suitable for hydroxylation.
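The exclusion step above depends on whether a docked pose places ABA in a conformation suitable for hydroxylation. One simple geometric proxy, in line with the C8'-Fe distance reported in Table 1, is to keep only poses whose C8' atom lies close to the heme iron. The sketch below assumes coordinates have already been extracted from the docking output, and the 6 Å cutoff is a hypothetical value, not one taken from the paper.

```python
import math

Coord = tuple[float, float, float]


def catalytic_poses(poses: dict[str, tuple[Coord, Coord]],
                    cutoff: float = 6.0) -> list[str]:
    """Return IDs of poses whose ABA C8' atom is within `cutoff` angstroms of Fe.

    poses: pose ID -> (C8' coordinates, heme Fe coordinates)
    cutoff: hypothetical distance threshold in angstroms
    """
    return [pose_id for pose_id, (c8, fe) in poses.items()
            if math.dist(c8, fe) <= cutoff]
```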

Then we examined the key binding residues in the seven initial ABA-CYP complexes, for CYP704A2 and six CYP707A proteins (Table 1). The binding sites were similar in all six initial ABA-CYP707A complexes. For example, in ABA-CYP707A3, Lys78 could form a hydrogen bond with ABA, the phenyl ring of Phe88 lay nearly parallel to the ring of ABA, Phe248 had a large contact area with ABA, and Leu319 was located between the heme and ABA (Figure 2). In contrast, CYP704A2 lacked the residues equivalent to those that allow CYP707As to bind ABA firmly (Figure 3).

Secondly, energy minimization and MD simulation were performed on the seven candidate docking complexes, and we compared the ABA locations in these complexes before and after MD simulation. The location of ABA in ABA-CYP704A2 changed considerably compared with the ABA-CYP707A complexes, which indicated that this complex was not stable (Table 1, Figure 3 and Additional file 3).
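One way to quantify how far ABA moved during MD simulation is the ligand RMSD between the docked pose and the final frame. The sketch below is an illustration, not the measure used in the paper (Table 1 reports the C8'-Fe distance); it assumes both poses list the same atoms in the same order and that the frames are already expressed in a common protein reference frame, so no superposition is applied.

```python
import math

Coord = tuple[float, float, float]


def ligand_rmsd(before: list[Coord], after: list[Coord]) -> float:
    """RMSD (angstroms) between two poses of the same ligand, without fitting."""
    if len(before) != len(after) or not before:
        raise ValueError("poses must be non-empty and have matching atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(before, after))
    return math.sqrt(squared / len(before))
```

A large value for ABA in CYP704A2 relative to the CYP707A complexes would reflect the instability described above.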

The interaction energy between ABA and the CYPs decreased substantially after energy minimization or MD simulation. Because a lower interaction energy represents firmer binding, this indicates that these steps are necessary to obtain more reliable complexes. Notably, the interaction energy for ABA-CYP704A2 was much higher than that for the ABA-CYP707As (Table 1). Taken together, the binding sites, ABA locations and interaction energies support the hypothesis that CYP704A2 is unlikely to be an ABA 8'-hydroxylase.
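The interaction energies above come from CHARMm in Discovery Studio 2.1. To illustrate what such a quantity measures, the sketch below sums a CHARMM-style Lennard-Jones term and a Coulomb term over all ligand-receptor atom pairs. It is a simplified stand-in, not the Discovery Studio calculation: there are no cutoffs or switching functions, the dielectric is constant, epsilon is given as a positive well depth, and the per-atom parameters are assumed inputs.

```python
import math

COULOMB = 332.0636  # kcal*angstrom/(mol*e^2)

# (xyz, partial charge in e, epsilon well depth in kcal/mol, Rmin/2 in angstroms)
Atom = tuple[tuple[float, float, float], float, float, float]


def interaction_energy(ligand: list[Atom], receptor: list[Atom],
                       dielectric: float = 1.0) -> tuple[float, float]:
    """Return (van der Waals, electrostatic) ligand-receptor energies in kcal/mol."""
    e_vdw = e_elec = 0.0
    for xyz_i, q_i, eps_i, rhalf_i in ligand:
        for xyz_j, q_j, eps_j, rhalf_j in receptor:
            r = math.dist(xyz_i, xyz_j)
            eps = math.sqrt(eps_i * eps_j)  # geometric-mean well depth
            rmin = rhalf_i + rhalf_j        # sum of the two Rmin/2 values
            x6 = (rmin / r) ** 6
            e_vdw += eps * (x6 * x6 - 2.0 * x6)  # minimum of -eps at r = rmin
            e_elec += COULOMB * q_i * q_j / (dielectric * r)
    return e_vdw, e_elec
```

An average over MD conformations, as in the "MDSᵃ" entries of Table 1, would apply such a calculation to each frame and take the mean of the per-frame totals.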

Table 1 The interaction energy (between ABA and receptors) and the distance (between ABA C8' and Fe) before and after MD simulation
CYP707A1 and CYP707A3 have splice isoforms. "I": the initial docking complex; "M": the complex after three steps of minimization; "MDSᵃ": the average interaction energy of the conformations following MD simulation; "MDSᵇ": the last conformation following MD simulation.

Figure 2 ABA-CYP707A complexes. ABA is shown in green and the key residues are shown in "scaled ball and stick" style. Residues Lys78, Phe88, Phe (around the 245th site) and Ile/Leu (around the 315th site) are important for ABA localization in CYP707As, while residue Tyr74/Phe74 is important for the location of Phe88. The Fe of the heme and the C8' of ABA are shown in yellow. The 74th site is shown in brown.

CYP707A4 had the lowest catalytic activity for ABA 8'-hydroxylation among the four CYP707As [5]. Intriguingly, after MD simulation, a hydrogen bond formed between Tyr74 of CYP707A4 and ABA, which did not occur with the other CYP707As (Figure 2 and Additional file 3), possibly because the equivalent residues in the other CYP707As differ from that in CYP707A4; for example, the 74th residue is Phe in CYP707A1 and CYP707A3. The residue and hydrogen bond differences at the 74th site point to a lower catalytic activity of CYP707A4 during ABA 8'-hydroxylation, which is consistent with previous results [5].
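Whether a hydrogen bond such as the one between Tyr74 and ABA is present in a given frame is usually decided geometrically. The sketch below applies commonly used cutoffs (donor-acceptor distance of at most 3.5 Å and a donor-H···acceptor angle of at least 120°); these thresholds are assumptions for illustration, and Discovery Studio's own criteria may differ.

```python
import math

Coord = tuple[float, float, float]


def angle_deg(a: Coord, b: Coord, c: Coord) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    v1 = [a[k] - b[k] for k in range(3)]
    v2 = [c[k] - b[k] for k in range(3)]
    cos = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding


def is_hbond(donor: Coord, hydrogen: Coord, acceptor: Coord,
             max_da: float = 3.5, min_angle: float = 120.0) -> bool:
    """Geometric hydrogen-bond test, e.g. a Tyr OH (donor) and an ABA oxygen."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)
```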

In summary, the docking results highlighted candidate CYPs and key residues that should be prioritised for further validation studies (Table 1), and they provide valuable insights into the mechanism of ABA 8'-hydroxylation that merit further investigation.

CYPSI database construction 

CYPSI was designed as a relational database on a typical LAMP (Linux, Apache, MySQL and Perl) platform aided by JavaScript. An overview of the CYPSI scheme is shown in Figure 4, and the relationships among the MySQL tables are shown in Additional file 4. Currently, CYPSI contains six categories of data: solved CYP structures, A. thaliana CYP sequences, predicted 3D structures of A. thaliana CYPs, related literature, metabolic pathways of A. thaliana CYPs and related ligands. In addition, the 18 CYPs docked with their ligands are included (see Additional file 5).
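The relational design can be illustrated with a small schema. The real table layout is given in Additional file 4 and is not reproduced here; the sketch below uses Python's built-in sqlite3 module in place of MySQL, and all table and column names are hypothetical. It links CYPs to their predicted models and to metabolic pathways, and shows the kind of join a pathway page would need.

```python
import sqlite3

# Hypothetical, simplified schema; not the actual CYPSI tables.
SCHEMA = """
CREATE TABLE cyp (
    agi    TEXT PRIMARY KEY,          -- AGI code, e.g. AT4G19230
    name   TEXT NOT NULL,             -- e.g. CYP707A1
    family TEXT NOT NULL              -- e.g. CYP707
);
CREATE TABLE model (
    model_id        INTEGER PRIMARY KEY,
    agi             TEXT NOT NULL REFERENCES cyp(agi),
    method          TEXT NOT NULL CHECK (method IN ('BMCD', 'I-TASSER', 'MUSTER')),
    template        TEXT,             -- PDB ID and chain of the template
    profile3d_ratio REAL
);
CREATE TABLE pathway (
    pathway_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    credibility TEXT CHECK (credibility IN ('y', 'na'))
);
CREATE TABLE cyp_pathway (            -- many-to-many link table
    agi        TEXT REFERENCES cyp(agi),
    pathway_id INTEGER REFERENCES pathway(pathway_id),
    PRIMARY KEY (agi, pathway_id)
);
"""


def bmcd_models_for_pathway(conn: sqlite3.Connection, pathway: str) -> list[tuple]:
    """Return (CYP name, template, Profile-3D ratio) of BMCD models in a pathway."""
    sql = """
        SELECT c.name, m.template, m.profile3d_ratio
        FROM pathway p
        JOIN cyp_pathway cp ON cp.pathway_id = p.pathway_id
        JOIN cyp c ON c.agi = cp.agi
        JOIN model m ON m.agi = c.agi AND m.method = 'BMCD'
        WHERE p.name = ?
        ORDER BY c.name
    """
    return conn.execute(sql, (pathway,)).fetchall()
```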

Hyperlinks to PDB, TAIR, UniProt [42] and PubMed are provided. Several useful tools are also integrated into CYPSI to facilitate browsing and searching, including sequence alignment, a search function for chemicals with similar structures and 3D structure animation using Jmol [43].

Utility 

Solved CYP structures

CYPSI contains 689 solved CYP structures associated with 400 PDB entries and provides comprehensive information on protein sequences, secondary structures, ligands and the interactions between ligands and receptors [44,45], with hyperlinks to PDB, UniProt and PubMed. For those who wish to perform homology modelling of CYPs, 76 high-quality CYP structures, marked "Recommended" in the "Template" field, are provided (Figure 5).

A. thaliana CYP models

The predicted 3D models for 266 A. thaliana CYPs are a key feature of CYPSI. Taking CYP707A1 as an example (Figure 6), the best 3D models predicted by the three methods (BMCD, I-TASSER and MUSTER) are shown in a table and can be used for further research. The model built by BMCD (red box) is recommended, since BMCD was designed specifically for CYP structure modelling and showed the best performance. Other initial models predicted by the three methods can be accessed via the raw data link. The TBM parameters are also provided, including the template, sequence alignment and sequence identity. To evaluate the quality of the predicted models, the estimated RMSD (dark red box), based on the ED between target and template, and the Profile-3D score (blue box) are shown. Links to the metabolic pathways, ligands and docking complexes are supplied in the lower right corner when these are available in CYPSI.

Figure 3 ABA-CYP707A3 and ABA-CYP704A2 complexes. Panels whose AGI names end with "D" show the last conformation following 50 ps of MD simulation; the others show the initial docking complex. The key residues close to ABA are shown in ball and stick style. Hydrogen bonds between ABA and protein residues are shown as green dotted lines and annotated with bright green labels.
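The mapping CYPSI uses to turn ED into the estimated RMSD shown on the model page is not described in this section. As an illustration of the general idea that structural divergence grows as sequence similarity falls, the sketch below uses the classic Chothia and Lesk relation between core backbone RMSD and the fraction of mutated residues H, RMSD ≈ 0.40·exp(1.87·H) Å. It takes sequence identity rather than ED as input and is therefore a stand-in, not the CYPSI estimate.

```python
import math


def chothia_lesk_rmsd(identity: float) -> float:
    """Approximate core backbone RMSD (angstroms) from fractional sequence identity.

    Uses RMSD = 0.40 * exp(1.87 * H), where H = 1 - identity is the fraction of
    mutated residues (Chothia and Lesk's empirical relation).
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    mutated = 1.0 - identity
    return 0.40 * math.exp(1.87 * mutated)
```

At about 30% identity, the level reported for the CYP707A-template pairs in the Discussion, this relation gives roughly 1.5 Å.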

Metabolic pathways 

Another feature of CYPSI is its comprehensive collection of metabolic pathways and ligands associated with A. thaliana CYPs. Around 70 A. thaliana CYPs have been investigated experimentally, 50 of which have clear functions associated with 48 metabolic pathways (see Additional file 2). Figure 7 shows an example page for the ABA catabolic pathway.



Search capabilities 

Besides browsing the data described above, CYPSI provides three search capabilities: by keywords, by chemical structures and by protein sequences.

From the search box in the upper right-hand corner of the web page, users can search by keywords: Arabidopsis Genome Initiative (AGI) codes, PDB IDs, CYP families and pathways.
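A keyword box that accepts several identifier types has to decide what kind of query it received. The sketch below shows one hypothetical way to route a query with regular expressions; the patterns (AGI codes such as AT4G19230, CYP family names such as CYP707A, four-character PDB IDs) are assumptions about the formats involved, not CYPSI's actual search logic, and anything unmatched falls back to a pathway-name search.

```python
import re

# Checked in order; the first matching pattern decides the query type.
PATTERNS = [
    ("agi", re.compile(r"AT[1-5CM]G\d{5}(\.\d+)?", re.IGNORECASE)),  # e.g. AT4G19230.1
    ("cyp_family", re.compile(r"CYP\d+[A-Z]*\d*", re.IGNORECASE)),   # e.g. CYP707A
    ("pdb_id", re.compile(r"[1-9][A-Z0-9]{3}", re.IGNORECASE)),      # e.g. 3DSI
]


def classify_query(query: str) -> str:
    """Return 'agi', 'cyp_family', 'pdb_id' or 'pathway' for a search string."""
    q = query.strip()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(q):
            return kind
    return "pathway"  # free-text fallback, e.g. "ABA catabolism"
```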

Figure 8 shows the web page for chemical structure similarity searches, which use ChemmineR version 1.4.0 [46]. Users can draw molecular structures online with the JME editor (http://www.molinspiration.com/jme/) or submit them in SDF format. NCBI BLAST version 2.2.20 [24] is used for sequence similarity searches (see Additional file 6). CYPs are generally multifunctional enzymes that may have many substrates; by combining ChemmineR and BLAST searches, CYPSI can be used to build links between CYP ligands and CYP sequences.
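The score reported by the chemical similarity search is the Tanimoto coefficient, c / (a + b - c), where a and b are the numbers of features in the two molecules and c is the number they share. ChemmineR computes it on atom pair descriptors; the sketch below uses plain Python sets of hypothetical feature strings instead, so it ignores feature multiplicities and only illustrates the scoring and ranking step.

```python
def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto coefficient between two feature sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty descriptor sets
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: set[str], library: dict[str, set[str]],
                       top: int = 5) -> list[tuple[str, float]]:
    """Return the `top` library entries most similar to the query."""
    scores = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]
```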

BMCD server 

In CYPSI, the BMCD server is used for template selection and sequence alignment (Additional file 7). Users only need to submit the target CYP sequence, and the results are returned within a few minutes. The sequence alignments produced by BMCD can be used directly in Discovery Studio 2.1 for TBM.

Figure 4 The CYPSI framework. The raw data are shown in light blue; the processed data are shown in blue; the tools used are shown in green and the data resources are shown in red. The BMCD pipeline for CYP structure modelling was developed as part of this study. "Arabidopsis CYPs": the A. thaliana CYP sequences. "CYPs structures": the solved CYP structures. "Templates": the structures recommended as templates for BMCD. "Prediction Structures" were generated by BMCD, I-TASSER and MUSTER. Metabolic "Pathways" were obtained from the PMN database and relevant scientific literature. "Ligands" were collected from "PubChem" or built manually. "Sequences" include the protein sequences of the A. thaliana CYPs and solved CYPs. The "Docked Complexes" were generated by CDOCKER. In addition, BLAST for sequence alignment and ChemmineR for identifying chemicals with similar structures can be used to discover the relationships between CYPs and ligands.

Figure 5 Structure quality evaluation. There are 7 PDB entries and 14 structures associated with CYP74A. "3DSI: A" was selected as the template since the "Quality Score" of this complex was the highest, based on structural completeness and the "Profile-3D Score Ratio" (labelled by the red box).

Figure 6 View of the CYP models screen.

Discussion 

To facilitate the study of plant CYPs, we have constructed the CYPSI platform, which contains comprehensive information on CYP sequences, structures, ligands and functions. Notably, all A. thaliana CYP 3D models were predicted using the BMCD pipeline and have undergone preliminary refinement, which should be particularly useful when investigating CYP structures and functions. In general, TBM involves four steps: template selection and sequence alignment, model construction, model refinement and model validation.

Figure 7 View of the ABA metabolic pathways screen. These pathways were collected from the scientific literature, with clarified functions marked with "y" in the "Credibility" field.

Figure 8 View of the chemical similarity search screen. Users can search for a ligand using keywords or structures. Chemical similarity searching is based on ChemmineR. The score is the Tanimoto coefficient.

The quality of the template is a key factor that determines the quality of the predicted models. Prior to TBM, potential templates were carefully selected, taking into consideration the completeness of the structure, resolution, presence of a substrate and the Profile-3D Score.
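The template criteria listed above can be combined in many ways, and the exact "Quality Score" formula behind the recommended templates is not given here. The sketch below therefore ranks candidate templates lexicographically, which needs no invented weights: first by structural completeness, then by Profile-3D Score Ratio, then by resolution, preferring substrate-bound structures as a final tie-breaker. The ordering of criteria and the field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Template:
    chain: str              # PDB ID and chain, e.g. "3DSI:A"
    completeness: float     # fraction of residues resolved (0-1)
    profile3d_ratio: float  # Profile-3D Score Ratio, higher is better
    resolution: float       # angstroms, lower is better
    has_substrate: bool     # True if a substrate or ligand is bound


def rank_templates(candidates: list[Template]) -> list[Template]:
    """Sort templates from most to least preferred (hypothetical ordering)."""
    return sorted(
        candidates,
        key=lambda t: (-t.completeness, -t.profile3d_ratio,
                       t.resolution, not t.has_substrate),
    )
```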

CYP sequences are highly diverse, and it is hard to find the most suitable template and obtain the correct sequence alignment for TBM [1,17,47]. We developed BMCD for CYP structure modelling and used COMPASS profile-profile alignments and ED to evaluate the similarities between templates and targets so that the best template is selected. In addition, most models generated by BMCD are based on a single template, as using multiple templates may introduce considerable structural errors [21,23].

The recommended BMCD models still need further refinement, which is even more difficult to control than template selection and sequence alignment [18,21]. Energy minimization and MD simulation are the main methods used for model refinement. However, it is generally difficult to improve model accuracy with these methods [18] because current force fields are not accurate enough. For example, in the case of CYP74A modelling (Additional file 8), I-TASSER used a special force field to refine its models, yet it performed even worse than MUSTER in terms of RMSD and TM-score [48]. We also found that many models were worse after energy minimization than the initial BMCD models, as evaluated by the Profile-3D Score Ratio. Therefore, we refined only the residues around the coenzyme heme, which are essential for studying CYP-ligand interactions.
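Restricting refinement to the residues around the heme requires deciding which residues count as "around". A common approach is a distance cutoff from any heme atom; the sketch below assumes atoms are given as (residue ID, coordinates) pairs and uses a hypothetical 8 Å cutoff. In such a setup, the selected residues would form the movable set during minimization while the rest of the model is held fixed.

```python
import math

Coord = tuple[float, float, float]


def residues_near(heme_atoms: list[Coord],
                  protein_atoms: list[tuple[str, Coord]],
                  cutoff: float = 8.0) -> list[str]:
    """Return sorted IDs of residues with any atom within `cutoff` angstroms of the heme.

    protein_atoms: (residue ID, atom coordinates) pairs, one entry per atom
    cutoff: hypothetical selection radius in angstroms
    """
    selected = {
        res_id
        for res_id, xyz in protein_atoms
        if any(math.dist(xyz, h) <= cutoff for h in heme_atoms)
    }
    return sorted(selected)
```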



Despite the limitations of current structure modelling methods, the CYP models in CYPSI could still be very useful for experimental researchers. In the case study searching for the CYPs responsible for ABA 8'-hydroxylation, the sequence identities of the CYP707A-template pairs were only around 30%, which is generally considered too low to build a high-quality homology model; nevertheless, the docking and MD simulation results agreed well with previous experimental results. These results also identified potential ABA-binding residues, which should help reveal the catalytic mechanism involved. However, conformational errors in these models are inevitable. Because residues close to a ligand may affect the final docking result, software that can handle both ligand and protein flexibility, e.g. AutoDock [49], is recommended for ligand docking. Further energy minimization or MD simulation is also recommended to obtain more comprehensive and reliable information about the enzyme-ligand complex.

Conclusions 

CYPSI was constructed as a comprehensive platform integrating sequence, structure, ligand and functional information for CYPs. It also provides useful tools and resources for CYP structural and functional investigations. The recommended models in CYPSI can be used directly for substrate docking, and the resulting enzyme-ligand complexes could provide valuable insights for experimental scientists. Further development of CYPSI should lead to the identification of more enzyme-ligand complexes.

Zhang et al. BMC Bioinformatics 2012, 13:332
http://www.biomedcentral.com/1471-2105/13/332

Availability and requirements 

The database is available at http://bioinfo.cau.edu.cn/CYPSI and is compatible with most modern web browsers. All data in CYPSI are downloadable and freely available to the academic community.

Additional files 